

Section: New Results

Geometric and Visual Features Fusion for Action Recognition

Participants : Adlen Kerboua, Farhood Negin, François Brémond.

keywords: skeleton, geometric features, visual features, CNN.

Figure 28. The data flow of our framework.

The proposed activity-recognition system fuses two types of features: geometric features and visual features. The geometric features are computed from the 2D/3D skeleton joints that represent the articulations of the human body, which can be extracted from RGB/RGB-D images by modern pose-estimation methods such as DeepCut [107] or OpenPose [51]. The visual features are extracted by a convolutional neural network (CNN) from RGB patches covering both hands. The main steps of the method are illustrated in Figure 28.

Geometric features

Before computing the geometric features, we apply several preprocessing steps to the skeleton joints.

To obtain the geometric features, we compute the Euclidean distances and spherical coordinates (angles) between every pair of skeleton joints, both within the same frame and across adjacent frames.
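The pairwise distance/angle computation described above can be sketched as follows. This is a minimal illustration, not the team's implementation: the function name, the ordering of the feature vector, and the handling of the zero-distance (self-pair) case are assumptions.

```python
import numpy as np

def geometric_features(joints_t, joints_t1):
    """Pairwise Euclidean distances and spherical angles between 3D joints.

    joints_t, joints_t1: (J, 3) arrays of joint coordinates for one frame
    and the following frame. Returns a 1-D feature vector with, for each
    ordered joint pair, (distance, azimuth, polar angle).
    """
    def pairwise(a, b):
        feats = []
        for i in range(len(a)):
            for j in range(len(b)):
                d = b[j] - a[i]                      # displacement vector
                r = np.linalg.norm(d)                # Euclidean distance
                theta = np.arctan2(d[1], d[0])       # azimuth angle
                # polar angle; guard against r == 0 for self-pairs
                phi = np.arccos(np.clip(d[2] / r, -1.0, 1.0)) if r > 0 else 0.0
                feats.extend([r, theta, phi])
        return feats

    intra = pairwise(joints_t, joints_t)    # pairs within the same frame
    inter = pairwise(joints_t, joints_t1)   # pairs across adjacent frames
    return np.array(intra + inter)
```

With J joints this yields 3·J² intra-frame values plus 3·J² inter-frame values per frame pair.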

Figure 29. The data flow of our framework.

Visual features

After extracting the skeleton joints in the preceding phase, we also extract RGB patches covering the body parts most involved in performing actions, in most cases the two hands. We then use several types of CNN (VGG16, VGG19, ResNet, ...) to extract the visual features. Unlike our preliminary gesture-recognition work in the PRAXIS project, we merge these features with the geometric ones by concatenation to obtain the final descriptor used for activity classification. During the test phase, a cross-subject protocol is used to reduce overfitting and to obtain solutions that generalize.
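The fusion by concatenation can be sketched as below. This is an illustrative assumption about the pipeline, not the authors' code: the per-hand feature vectors (e.g., taken from a CNN's penultimate layer) and the optional L2 normalization are hypothetical details.

```python
import numpy as np

def fuse_descriptors(visual_left, visual_right, geometric):
    """Concatenate CNN hand-patch features with geometric features.

    visual_left / visual_right: 1-D CNN feature vectors for each hand patch
    (e.g., 4096-D from a VGG-style fully connected layer); geometric: the
    pairwise distance/angle vector. Dimensions are illustrative.
    """
    descriptor = np.concatenate([visual_left, visual_right, geometric])
    # optional L2 normalization so no single modality dominates the classifier
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor
```

The resulting descriptor is what a downstream classifier (e.g., an SVM or a fully connected layer) would consume.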

Experiments

In the experimentation phase, we chose several datasets, including datasets collected by the STARS team such as the PRAXIS dataset. This innovative test battery, administered to people with Alzheimer's disease, is very useful for evaluating the evolution of the disease. It contains more than 29 gestures, divided into static and dynamic gestures and repeated several times by 58 patients, for a total of 3227 gestures, some performed correctly and others in an inconsistent way. STARS also uses the most popular public datasets in the scientific community, such as NTU RGB+D [114] and ChaLearn [124], in order to evaluate the proposed methods against the current state of the art.
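The cross-subject protocol mentioned above keeps all samples of a given subject entirely in either the training or the test partition, so the classifier is evaluated on unseen people. A minimal sketch, assuming subject identifiers are available per sample (the split choice is illustrative):

```python
import numpy as np

def cross_subject_split(subject_ids, train_subjects):
    """Boolean train/test masks for a cross-subject evaluation protocol.

    subject_ids: per-sample subject identifiers; train_subjects: the set of
    subjects assigned to training (an illustrative choice, not the official
    split of any specific benchmark).
    """
    subject_ids = np.asarray(subject_ids)
    train_mask = np.isin(subject_ids, list(train_subjects))
    return train_mask, ~train_mask
```

Benchmarks such as NTU RGB+D publish a fixed list of training subjects for exactly this purpose.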